Coalescence Type based Confidence Warping for Agglutinative Language Keyword Spotting

Authors

  • Ji Xu
  • Yeming Xiao
  • Jielin Pan
  • Yonghong Yan
Abstract

In agglutinative languages such as Korean, words are formed by joining affix morphemes to a stem, which leads to a high out-of-vocabulary (OOV) rate when building a dictionary. Hence, subword units are usually used as the basic language modeling units in large-vocabulary continuous speech recognition (LVCSR) and in LVCSR-based applications such as keyword spotting. This work first introduces a new word property called coalescence type, which is defined from the result of the word segmentation process and is therefore specific to agglutinative languages. A confidence warping approach is then proposed that uses this additional linguistic-level information to adjust the confidence measure of keyword candidates. An evaluation on a Korean telephone speech keyword spotting task shows an improvement in precision of up to 2%, a significant gain over the baseline system.
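As a rough illustration of what adjusting a confidence measure by a per-candidate linguistic label could look like, the following is a minimal Python sketch. It is not the authors' method: the abstract does not give the warping function or the set of coalescence types, so the type labels, the offset values, and the choice of an additive shift in the log-odds domain are all assumptions made for illustration.

```python
import math

# Hypothetical coalescence-type labels and offsets (log-odds domain).
# Neither the labels nor the values come from the paper; in practice
# such parameters would be tuned on held-out data.
TYPE_OFFSET = {
    "standalone": 0.5,    # assumed: keyword matches a whole segmented word
    "with_suffix": 0.0,   # assumed: keyword stem followed by affixes
    "inside_word": -0.5,  # assumed: keyword found inside a longer word
}


def logit(p):
    """Map a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Map log-odds back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def warp_confidence(confidence, coalescence_type, eps=1e-6):
    """Shift a confidence score by a type-dependent offset (assumed form)."""
    p = min(max(confidence, eps), 1.0 - eps)  # keep logit finite
    offset = TYPE_OFFSET.get(coalescence_type, 0.0)  # unknown type: no change
    return sigmoid(logit(p) + offset)


def accept(candidates, threshold=0.5):
    """Keep (keyword, confidence, type) candidates whose warped score passes."""
    return [c for c in candidates if warp_confidence(c[1], c[2]) >= threshold]
```

Under these assumed offsets, two candidates with the same raw confidence can end up on opposite sides of the decision threshold, which is the general effect a type-dependent warping is meant to have on precision.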

Similar resources

Keyword spotting for highly inflectional languages

This paper presents our new keyword spotting system taking advantage of both the filler model and the confidence measure approaches. The novelty is in a non-standard connection of the filler and the keyword models together with introduction of a new confidence measure based on a keyword normalized score. In detail the paper deals with a decision block. Two methods are introduced. The first is b...

Using phonological phrase segmentation to improve automatic keyword spotting for the highly agglutinating Hungarian language

This paper investigates the use of prosody to improve keyword spotting, focusing on the highly agglutinating Hungarian language, where keyword spotting cannot be performed effectively using LVCSR, as such systems are either unavailable or hard to operate due to high OOV rates and poor N-gram language modelling capabilities. Therefore, the applied keyword spotting system is based on...

A Piecewise Aggregate Approximation Lower-Bound Estimate for Posteriorgram-Based Dynamic Time Warping

In this paper, we propose a novel lower-bound estimate for dynamic time warping (DTW) methods that use an inner product distance on multi-dimensional posterior probability vectors known as posteriorgrams. Compared to our previous work, the new lower-bound estimate uses piecewise aggregate approximation (PAA) to reduce the time required for calculating the lower-bound estimate. We describe the P...

Spanish Keyword Spotting System Based on Filler Models, Pseudo N-gram Language Model and a Confidence Measure

To organize many hours of audio content efficiently, such as meetings and radio news, searching for spoken keywords is essential. One approach uses filler models to account for non-keyword intervals. Another approach uses a large-vocabulary continuous speech recognition (LVCSR) system, which retrieves a word string and then searches for the keywords in this string. This approach yields high p...

Keyword Spotting with Convolutional Deep Belief Networks and Dynamic Time Warping

To spot keywords on handwritten documents, we present a hybrid keyword spotting system, based on features extracted with Convolutional Deep Belief Networks and using Dynamic Time Warping for word scoring. Features are learned from word images, in an unsupervised manner, using a sliding window to extract horizontal patches. For two single writer historical data sets, it is shown that the propose...

Journal:
  • JSW

Volume 9  Issue 

Pages  -

Publication date: 2014